Appendix B

CASE‐STUDY 2: SIMULATION SCENARIOS

Authors
Affiliations

Julie Vercelloni

Australian Institute of Marine Science, Townsville, Australia

Centre for Data Science, Queensland University of Technology, Brisbane, Australia

Murray Logan

Australian Institute of Marine Science, Townsville, Australia

Andrew Zammit‐Mangion

School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Matthew Sainsbury‐Dale

School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Britta Schaffelke

Australian Institute of Marine Science, Townsville, Australia

Kerrie Mengersen

Centre for Data Science, Queensland University of Technology, Brisbane, Australia

Manuel González‐Rivero

Australian Institute of Marine Science, Townsville, Australia

Important

To be displayed on a big screen.

Introduction to synthos

The R package synthos (available on GitHub) generates synthetic data with the primary aim of testing new methods and sampling designs in a controlled environment. The data are created from spatio-temporal dependency structures mimicking realistic baselines, population dynamics, disturbance regimes and stochastic processes.

The generation of synthetic data is based on four main steps:

  • Generation of the spatio-temporal domains: the first step consists of generating virtual coral reefs across a spatial domain.
  • Generation of the disturbances: three disturbance types are generated (Heat Stress events, Cyclones and “Other”) using a mechanistic approach that generates different spatial and temporal footprints across the spatial domain.
  • Generation of baselines: these represent coral cover values in the year prior to the first sampling.
  • Generation of sampling designs: the user selects the number of monitoring locations, the surveyed years, and any additional spatial scales.

Coral cover values are generated by combining baseline values, disturbance effects, and growth rates, and then projecting their cumulative effects across the spatial domain. The sampling design specifies the number of observations and years to be selected within a domain, creating the observed dataset. Models are then fitted to these observed data, and their predictive performance is evaluated at locations that are not monitored but for which true values are known.

Synthos modelling pipeline

We developed a modelling pipeline to test the predictive capabilities of the spatio-temporal FRK model under different sampling designs. Four scenarios are explored, with varying numbers of monitored reefs and surveyed years (Table 1).

Table 1: Details of the simulation scenarios.

Scenario             Number of reefs   Number of years
High-resolution      25                15
Medium-resolution    15                15
Low-resolution       5                 15
Temporally-sparse    50                2-15

Other settings were kept constant across the simulation scenarios. The coral baseline was generated using a west–east gradient of coral cover, consistent with patterns observed on the Great Barrier Reef. The relative influence of disturbances on coral dynamics was set to 60% for heat stress events, 39% for cyclones, and 1% for other factors. Coral growth was fixed at 30%, with a possible coverage range from 1% to 70%. The sampling design emulates the monitoring program of the Australian Institute of Marine Science, where data are collected within each reef from 2 sites, 5 transects per site, 100 photo frames per transect. Finally, point-based observations are generated to mimic the outputs of the machine learning system used in the ReefCloud platform, with 50 points per frame automatically classified.
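As a rough check on the size of each simulated dataset, the sampling hierarchy described above can be multiplied out. The sketch below is illustrative Python and is not part of synthos; the function names are hypothetical. It assumes that the 2-15 entry in Table 1 is a range of surveyed years and that every monitored reef follows the full site, transect and frame hierarchy in each surveyed year.

```python
# Sampling hierarchy taken from the text (per reef, per surveyed year).
SITES_PER_REEF = 2
TRANSECTS_PER_SITE = 5
FRAMES_PER_TRANSECT = 100
POINTS_PER_FRAME = 50

# Scenario settings from Table 1: (number of reefs, (min years, max years)).
# Treating "2-15" as a (min, max) range is an assumption of this sketch.
SCENARIOS = {
    "High-resolution": (25, (15, 15)),
    "Medium-resolution": (15, (15, 15)),
    "Low-resolution": (5, (15, 15)),
    "Temporally-sparse": (50, (2, 15)),
}


def points_per_reef_year():
    """Number of classified points collected on one reef in one surveyed year."""
    return (SITES_PER_REEF * TRANSECTS_PER_SITE
            * FRAMES_PER_TRANSECT * POINTS_PER_FRAME)


def transect_record_bounds(scenario):
    """Lower and upper bound on the number of transect-level records."""
    reefs, (min_years, max_years) = SCENARIOS[scenario]
    transects_per_reef_year = SITES_PER_REEF * TRANSECTS_PER_SITE
    return (reefs * min_years * transects_per_reef_year,
            reefs * max_years * transects_per_reef_year)


if __name__ == "__main__":
    print("points per reef-year:", points_per_reef_year())
    for name in SCENARIOS:
        print(name, transect_record_bounds(name))
```

Under these assumptions each reef contributes 50,000 classified points per surveyed year, and the High-resolution scenario yields 3,750 transect-level records.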

The spatial domain is divided into a 0.1° tessellation grid, from which tier-level disturbance values are extracted and coral cover values are averaged for use in the spatio-temporal model. The predictive performance of the spatio-temporal model for each scenario is tested using four predictive measures (see details below). These measures are calculated using model predictions at prediction-tiers, where true values are known.
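The tier-level averaging can be illustrated with a small sketch. This is not the synthos implementation: for simplicity it assumes square 0.1° cells rather than the hexagonal tiers shown in the figures, and the names `tier_id` and `tier_means` as well as the input layout are hypothetical. A tier is labelled a data-tier if at least one of its reefs is monitored, and a new-tier otherwise.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.1  # grid resolution stated in the text


def tier_id(lon, lat, cell=CELL_SIZE_DEG):
    """Index of the square cell containing a point (assumption: square cells)."""
    return (math.floor(lon / cell), math.floor(lat / cell))


def tier_means(reefs):
    """Average coral cover per tier and flag tiers that contain monitoring data.

    `reefs` is an iterable of (lon, lat, cover_percent, monitored) tuples.
    """
    acc = defaultdict(lambda: [0.0, 0, False])  # sum, count, any monitored
    for lon, lat, cover, monitored in reefs:
        entry = acc[tier_id(lon, lat)]
        entry[0] += cover
        entry[1] += 1
        entry[2] = entry[2] or monitored
    return {
        tid: {
            "mean_cover": total / count,
            "kind": "data-tier" if has_data else "new-tier",
        }
        for tid, (total, count, has_data) in acc.items()
    }


if __name__ == "__main__":
    example = [
        (145.23, -16.47, 30.0, True),   # monitored reef
        (145.27, -16.43, 40.0, False),  # same cell, not monitored
        (145.61, -16.12, 20.0, False),  # different cell, not monitored
    ]
    for tid, summary in tier_means(example).items():
        print(tid, summary)
```

In the example, the first two reefs fall in the same cell, which becomes a data-tier with a mean cover of 35%, while the third reef forms a new-tier on its own.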

Additional results on synthetic data visualization, predictions across multiple spatial scales, uncertainty, model goodness-of-fit, and disturbances are presented below.

Monitoring locations

Figure 1: Regional distribution of the synthetic reefs for the High-resolution scenario. Red dots indicate monitoring locations and hexagons represent tier boundaries. A tier is considered a data-tier if it contains a red dot; otherwise, it is classified as a new-tier.

Figure 2: Regional distribution of the synthetic reefs for the Low-resolution scenario. Red dots indicate monitoring locations and hexagons represent tier boundaries. A tier is considered a data-tier if it contains a red dot; otherwise, it is classified as a new-tier.

Figure 3: Regional distribution of the synthetic reefs for the Medium-resolution scenario. Red dots indicate monitoring locations and hexagons represent tier boundaries. A tier is considered a data-tier if it contains a red dot; otherwise, it is classified as a new-tier.

Figure 4: Regional distribution of the synthetic reefs for the Temporally-sparse scenario. Red dots indicate monitoring locations and hexagons represent tier boundaries. A tier is considered a data-tier if it contains a red dot; otherwise, it is classified as a new-tier.

Data visualization

Figure 5: Generated coral cover (%) values by transect within each reef for the High-resolution scenario, with colours representing different sites.

Figure 6: Generated coral cover (%) values by transect within each reef for the Low-resolution scenario, with colours representing different sites.

Figure 7: Generated coral cover (%) values by transect within each reef for the Medium-resolution scenario, with colours representing different sites.

Figure 8: Generated coral cover (%) values by transect within each reef for the Temporally-sparse scenario, with colours representing different sites.

Figure 9: Temporal patterns of mean coral cover (%) by site within reef for the High-resolution scenario. White gaps show years when reefs were not monitored.

Figure 10: Temporal patterns of mean coral cover (%) by site within reef for the Low-resolution scenario. White gaps show years when reefs were not monitored.

Figure 11: Temporal patterns of mean coral cover (%) by site within reef for the Medium-resolution scenario. White gaps show years when reefs were not monitored.

Figure 12: Temporal patterns of mean coral cover (%) by site within reef for the Temporally-sparse scenario. White gaps show years when reefs were not monitored.

Model prediction

Figure 17: Predictions of coral cover (%) for the High-resolution scenario.

Figure 18: Predictions of coral cover (%) for the Low-resolution scenario.

Figure 19: Predictions of coral cover (%) for the Medium-resolution scenario.

Figure 20: Predictions of coral cover (%) for the Temporally-sparse scenario.

Model uncertainty

Figure 21: Uncertainty associated with predicted coral cover values for the High-resolution scenario.

Figure 22: Uncertainty associated with predicted coral cover values for the Low-resolution scenario.

Figure 23: Uncertainty associated with predicted coral cover values for the Medium-resolution scenario.

Figure 24: Uncertainty associated with predicted coral cover values for the Temporally-sparse scenario.

Model attribution

Figure 33: Estimated effect sizes of disturbance exposure (on the logit scale) for the High-resolution scenario. The points represent the estimated effects and the intervals represent the corresponding 95% confidence intervals.

Figure 34: Estimated effect sizes of disturbance exposure (on the logit scale) for the Low-resolution scenario. The points represent the estimated effects and the intervals represent the corresponding 95% confidence intervals.

Figure 35: Estimated effect sizes of disturbance exposure (on the logit scale) for the Medium-resolution scenario. The points represent the estimated effects and the intervals represent the corresponding 95% confidence intervals.

Figure 36: Estimated effect sizes of disturbance exposure (on the logit scale) for the Temporally-sparse scenario. The points represent the estimated effects and the intervals represent the corresponding 95% confidence intervals.

Model fit

Figure 37: Model goodness-of-fit showing predicted coral cover versus true values for the prediction-tiers. True values correspond to the mean coral cover at tier level.

Cyclone

Figure 38: Relative effects of cyclone exposure per surveyed year for the High-resolution scenario.

Figure 39: Relative effects of cyclone exposure per surveyed year for the Low-resolution scenario.

Figure 40: Relative effects of cyclone exposure per surveyed year for the Medium-resolution scenario.

Figure 41: Relative effects of cyclone exposure per surveyed year for the Temporally-sparse scenario.

Heat stress

Figure 42: Relative effects of heat stress events per surveyed year for the High-resolution scenario.

Figure 43: Relative effects of heat stress events per surveyed year for the Low-resolution scenario.

Figure 44: Relative effects of heat stress events per surveyed year for the Medium-resolution scenario.

Figure 45: Relative effects of heat stress events per surveyed year for the Temporally-sparse scenario.

Details on predictive measures

Each of these predictive measures yields a single number, with lower scores indicating better performance.

  • 95% coverage error (CvgErr): evaluates how often the prediction intervals include the true observations, with the goal of capturing the true values 95% of the time. It is the absolute difference between the nominal coverage (0.95) and the empirical coverage, estimated as follows:

\[ \text{CvgErr}(y, \ell, u) \;=\; \left| 0.95 \;-\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left( \ell_i < y_i < u_i \right) \right| \]

where \(y = \{y_1, y_2, \dots, y_n\}\) are the coral cover observations, \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(\mathbf{1}(\cdot)\) is the indicator function (1 if the condition is true, 0 otherwise).
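The coverage error can be computed directly from the observations and the interval bounds. The following is a minimal Python sketch of the formula above; the function name `coverage_error` and its argument names are hypothetical and are not taken from the modelling pipeline.

```python
def coverage_error(obs, lower, upper, nominal=0.95):
    """Absolute gap between nominal and empirical interval coverage.

    An observation counts as covered only if it lies strictly inside
    its interval, matching the strict inequalities in the formula.
    """
    n = len(obs)
    covered = sum(1 for y, lo, hi in zip(obs, lower, upper) if lo < y < hi)
    return abs(nominal - covered / n)


if __name__ == "__main__":
    obs = [10.0, 20.0, 30.0, 40.0]
    lower = [5.0, 15.0, 35.0, 35.0]
    upper = [15.0, 25.0, 45.0, 45.0]
    # Three of the four observations are covered, so coverage is 0.75.
    print(coverage_error(obs, lower, upper))
```

Note that the measure does not distinguish between intervals that cover too often and intervals that cover too rarely: an empirical coverage of 1.00 and one of 0.90 both give an error of 0.05.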

  • 95% interval score (IS): rewards narrow prediction intervals that include the true observations and penalizes intervals that miss them, in proportion to the distance between the observation and the nearest bound. It is computed as follows:

\[ \text{IS}_{95} \;=\; \frac{1}{n} \sum_{i=1}^{n} \Bigg[ (u_i - \ell_i) + \frac{2}{\alpha} (\ell_i - y_i)\,\mathbf{1}(y_i < \ell_i) + \frac{2}{\alpha} (y_i - u_i)\,\mathbf{1}(y_i > u_i) \Bigg] \]

where \(\alpha = 0.05\), \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(y\) are the observed coral cover values.
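To make the two components of the interval score concrete, the formula can be written as a short Python sketch. The function name `interval_score` and its arguments are hypothetical and are not taken from the modelling pipeline.

```python
def interval_score(obs, lower, upper, alpha=0.05):
    """Mean interval score: interval width plus a penalty for each miss."""
    total = 0.0
    for y, lo, hi in zip(obs, lower, upper):
        score = hi - lo                          # width term
        if y < lo:
            score += (2.0 / alpha) * (lo - y)    # observation below the interval
        elif y > hi:
            score += (2.0 / alpha) * (y - hi)    # observation above the interval
        total += score
    return total / len(obs)


if __name__ == "__main__":
    # First interval covers its observation; the second misses it by 5 units.
    print(interval_score([10.0, 30.0], [5.0, 35.0], [15.0, 45.0]))
```

With \(\alpha = 0.05\) each unit of miss distance costs 40 units of score, so in the example the missed observation contributes 10 + 40 × 5 = 210 while the covered one contributes only its interval width of 10.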

  • Root-mean-squared prediction error (RMSPE): measures how far model predictions are from the true observations, without accounting for uncertainty.

\[ \text{RMSPE} \;=\; \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } \]

where \(y\) and \(\hat{y}\) are the observed and predicted coral cover values, respectively, and \(n\) the total number of observations.
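A minimal Python sketch of this formula is given below; the function name `rmspe` is hypothetical and is not taken from the modelling pipeline.

```python
import math


def rmspe(obs, pred):
    """Root-mean-squared prediction error between observations and predictions."""
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(obs, pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Every prediction is off by 2 percentage points, so the RMSPE is 2.
    print(rmspe([10.0, 20.0, 30.0], [12.0, 18.0, 32.0]))
```

Because the errors are squared before averaging, a few large errors raise the RMSPE more than many small ones, and the result is expressed in the same units as coral cover.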

  • Continuous Ranked Probability Score (CRPS): represents the quality of the predictions over the entire predictive probability distribution, penalizing predictions that are inaccurate, imprecise or overconfident. For a normal predictive distribution it has the closed form:

\[ \text{CRPS}(F, y) \;=\; \sigma \left[ z \left( 2 \Phi(z) - 1 \right) \;+\; 2 \,\phi(z) \;-\; \frac{1}{\sqrt{\pi}} \right], \quad z = \frac{y - \mu}{\sigma} \]

where \(y\) is the observed coral cover value, \(\mu\) and \(\sigma\) are the mean and the standard deviation of the normal predictive distribution, \(\phi(\cdot)\) is the standard normal probability density function and \(\Phi(\cdot)\) is the standard normal cumulative distribution function.
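The closed-form expression for a normal predictive distribution can be evaluated with the Python standard library alone. The sketch below is illustrative; the function name `crps_normal` is hypothetical, and averaging the per-observation scores over all prediction locations is an assumption about how a single summary number would be obtained.

```python
import math


def crps_normal(y, mu, sigma):
    """CRPS of a normal predictive distribution N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_crps(obs, mus, sigmas):
    """Average CRPS over all predictions (assumed summary for a scenario)."""
    scores = [crps_normal(y, m, s) for y, m, s in zip(obs, mus, sigmas)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A perfectly centred prediction still scores above zero because of its spread.
    print(crps_normal(30.0, 30.0, 5.0))
    print(mean_crps([30.0, 42.0], [30.0, 35.0], [5.0, 5.0]))
```

When the observation equals the predictive mean, the score reduces to \(\sigma(\sqrt{2/\pi} - 1/\sqrt{\pi}) \approx 0.234\,\sigma\), which shows how the CRPS penalizes wide predictive distributions even when the point prediction is exact.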